Abstract

We consider the following sequential decision 
problem. Given a set of items of unknown 
utility, we need to select one of as high a 
utility as possible ("the selection problem").
Measurements (possibly noisy) of item val- 
ues prior to selection are allowed, at a known 
cost. The goal is to optimize the overall 
sequential decision process of measurements 
and selection. 

Value of information (VOI) is a well-known 
scheme for selecting measurements, but the 
intractability of the problem typically leads 
to using myopic VOI estimates. In the 
selection problem, myopic VOI frequently 
badly underestimates the value of informa- 
tion, leading to inferior sensing plans. We 
relax the strict myopic assumption into a 
scheme we term semi-myopic, providing a
spectrum of methods that can improve the 
performance of sensing plans. In particu- 
lar, we propose the efficiently computable 
method of "blinkered" VOI, and examine 
theoretical bounds for special cases. Empiri- 
cal evaluation of "blinkered" VOI in the selec- 
tion problem with normally distributed item 
values shows that it performs much better
than pure myopic VOI. 

1 INTRODUCTION 

Decision-making under uncertainty is a domain with 
numerous important applications. Since these prob- 
lems are intractable in general, special cases are of 
interest. In this paper, we examine the selection problem:
given a set of items of unknown utility (but with a
known distribution), we need to select an
item with as high a utility as possible. Measurements 
(possibly noisy) of item values prior to selection are
allowed, at a known cost. The problem is to optimize
the overall decision process of measurement and selec- 
tion. Even with the severe restrictions imposed by the 
above setting, this decision problem is intractable [5]; 
and yet it is important to be able to solve, at least 
approximately, as it has several potential applications, 
such as sensor network planning, and oil exploration. 

Other settings where this problem is applicable are in 
considering which time-consuming deliberation steps 
to perform (meta-reasoning \W) before selecting an ac- 
tion, locating a point of high temperature using a lim- 
ited number of measurements (with dependencies be- 
tween locations as in [5]), and the problem of finding 
a good set of parameters for setting up an industrial 
imaging system. The latter is actually the original 
motivation for this research, and is briefly discussed in 
section 4.2.

A widely adopted scheme for selecting measurements, 
also called sensing actions in some contexts (or deliber- 
ation steps, in the context of meta-reasoning) is based 
on value of information (VOI) [9]. Computing value
of information is intractable in general, thus both
researchers and practitioners often use various forms of
myopic VOI estimates [9] coupled with greedy search.
Even when not based on solid theoretical guarantees, 
such estimates lead to practically efficient solutions in 
many cases. 

However, in a selection problem involving real-valued
items, the main focus of this paper, coupled with the 
capability of the system to perform more than one 
measurement for each item, the myopic VOI estimate 
can be shown to badly underestimate the value of information.
This can lead to inferior measurement sequences: because
of the myopic approximation, in many cases no measurement
appears to have a VOI estimate greater than
its cost. Our goal
is to find a scheme that, while still efficiently com- 
putable, can overcome this limitation of myopic VOI. 
We propose the framework of semi-myopic VOI, which 
includes the myopic VOI as the simplest special case,
but also much more general schemes such as measurement
batching, and exhaustive subset selection at the
other extreme. Within this framework we propose the 
"blinkered" VOI estimate, a variant of measurement 
batching, as one that is efficiently computable and yet 
performs much better than myopic VOI for the selec- 
tion problem. 

The rest of the paper is organized as follows. We be- 
gin with a formal definition of our version of the se- 
lection problem and other preliminaries. We then ex- 
amine a pathological case of myopic VOI, and present 
our framework of semi-myopic VOI. The "blinkered" 
VOI is then defined as a scheme within the framework,
followed by theoretical bounds for some simple spe- 
cial cases. Empirical results, comparing different VOI 
schemes to blinkered VOI, further support using this 
scheme in the selection problem. We conclude with a
discussion of closely related work. 

2 BACKGROUND 

We begin by formally defining the selection problem, 
followed by a description of the standard myopic VOI 
approach for solving it. 

2.1 The Optimization Problem 

The selection problem is defined as follows. Given

• a set $S = \{s_1, s_2, \ldots, s_n\}$ of $n$ items;

• initial beliefs (probability distribution) about
item values $Bel(S) = p(s_1 = x_1, \ldots, s_n = x_n)$;

• a utility function $u : \mathbb{R} \to \mathbb{R}$;

• a cost function $c : \mathbb{N} \to \mathbb{R}$ defining the cost of a
single measurement of item $i$;

• a budget or maximum allowed number of measurements $m$;

• a measurement model, i.e. a probability distribution
of the observation outcome given the true value
of each item, $p_o(y|x)$;

find a policy that maximizes the expected net utility
of the selection: the utility of the selected item less the
cost of the measurements. Although in practice we
allow different types of measurements (i.e. with different
cost and measurement error model), we assume initially,
for simplicity, that all measurements of an item
are i.i.d. given the item value. Thus, if item $s_i$ is
chosen after measurement sequence $(M_1, \ldots, M_{m'})$, and
the true value of $s_i$ is $x_i$, the net result $U^{net}$ is:

$$U^{net} = u(x_i) - \sum_{j=1}^{m'} c(M_j) \qquad (1)$$

We assume that the posterior beliefs
$p^+(s_1 = x_1, s_2 = x_2, \ldots, s_n = x_n | y)$ and the marginal
posterior beliefs $p^+(s_i = x | y)$ about an item value can be
computed efficiently. More specifically, we usually represent the
distribution using a structured probability model, such
as a Bayes network or Markov network. The assumption
is that either the structure or distribution of the
network is such that belief updating is easy, that the
network is sufficiently small, or that the network is
such that an approximate and efficient belief-updating
algorithm (such as loopy belief updating) provides a
good approximation. Observe that this assumption
does not make the selection problem tractable, as even
in a chain-structured network the selection problem is
NP-hard [5]. In fact, even when assuming that the
beliefs about items are independent (as we will do for
much of the sequel), the problem is still hard.

2.2 Limited Rationality Approach

In its most general setting, the selection problem can
be modeled as a (continuous state) indefinite-horizon
POMDP [1], which is badly intractable. Following [9],
we thus use a greedy scheme that:

• chooses at each step a measurement with the
greatest value of information (VOI), performing
belief updating after getting the respective observation,

• stops when no measurement with a positive VOI
exists, and

• selects item $s_\alpha$ with the greatest expected utility:

$$E(U_\alpha) = \int_{-\infty}^{\infty} p_\alpha(x)\, u(x)\, dx \qquad (2)$$

VOI of a measurement $M_j$ is defined as follows: denote
by $E(U_i^{j+})$ the expected net utility of item $s_i$
after measurement $j$ and a subsequent optimal measurement
plan. Let $s_{\alpha^-}$ be the item that currently has
the greatest expected net utility $E(U_{\alpha^-})$. Likewise,
let $s_{\alpha^{j+}}$ be the item with the greatest expected utility
$E(U^{j+}_{\alpha^{j+}})$ after a measurement plan beginning with
observation $j$. Then:

$$VOI(M_j) = E(U^{j+}_{\alpha^{j+}}) - E(U^{j+}_{\alpha^-}) \qquad (3)$$

2.3 Myopic Value of Information Estimate 

Computing value of information of a measurement is 
intractable, and is thus usually estimated instead under
the assumptions of meta-greediness and single-step.
Under these assumptions, the myopic scheme consid- 
ers only one measurement step, and ignores value of 
later measurements. A measurement step can consist 
of a single measurement or of a fixed number thereof 
with no deliberation between the measurements. 

A measurement is beneficial only if it changes which 
item appears to have the greatest estimated expected 
utility. For items that are mutually independent (essentially
the subtree independence assumption of [9]),
a measurement only affects beliefs about the measured 
item. If the measured item does not seem to be the 
best now, but can become better than the current best 
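item $\alpha$ when the belief is updated, the benefit in this
case is:

$$B_i(y) = \max\left( \int_{-\infty}^{\infty} u(x)\, p_i^+(x|y)\, dx - E(U_\alpha),\; 0 \right) \qquad (4)$$

If the measured item $\alpha$ can become worse than the
next-to-best item $\beta$, the benefit is:

$$B_\alpha(y) = \max\left( E(U_\beta) - \int_{-\infty}^{\infty} u(x)\, p_\alpha^+(x|y)\, dx,\; 0 \right) \qquad (5)$$

where $y$ is the observed outcome, and $p_i^+(x|y)$ is the
posterior belief.

For these two cases, the myopic VOI estimate $MVI_i$ of
observing the $i$th item at step $j$ of the algorithm is:

$$MVI_i = \int_{-\infty}^{\infty} p_i(x) \int_{-\infty}^{\infty} B_i(y)\, p_o(y|x)\, dy\, dx - c(j) \qquad (6)$$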



2.4 Myopic Scheme: Shortcomings 

The decisions the myopic scheme makes at each step 
are: which measurement is the most valuable, and 
whether the most valuable measurement has a positive 
value. The simplifying assumptions are justified when 
they lead to decisions sufficiently close to optimal in 
terms of the performance measure, the net utility. 

The first decision controls search "direction". When
it is wrong, a non-optimal item is measured, and thus 
more measurements are done before arriving at a final
decision. The net utility decreases due to the higher
costs. The second decision determines when the al- 
gorithm terminates, and can be erroneously made too 
late or too early. Made too late, it leads to the same 
deficiency as above: the measurement cost is too high. 
Made too early, it causes early and potentially incor- 
rect item selection, due to wrong beliefs. The net util- 
ity decreases because the item's utility is low. 

In terms of value of information, the assumptions lead 
to correct decisions when the value of information is 
negative for every sequence of steps, or if there exists 
a measurement that according to the meta-greedy ap- 
proach has the greatest (positive) VOI estimate, and 
is the head of an optimal measurement plan. These
criteria are related to the notion of non-increasing re- 
turns; the assumptions are based on an implicit hy- 
pothesis that the intrinsic value grows more slowly 
than the cost. When the hypothesis is correct, the 
assumptions should work well; otherwise, the myopic 
scheme either gets stuck or goes the wrong way. 

It is commonly believed that many processes exhibit 
diminishing returns; the law of diminishing returns is 
considered a universal law in economics [4]. However, 
this only holds asymptotically: while it is often true 
that starting with some point in time the returns never 
grow, until that point they can alternate between in- 
creases and decreases. Sigmoid-shaped returns were 
discovered in marketing [3]. As experimental results 
show [13], they are not uncommon in sensing and planning.
In such cases, an approach that can deal with
increasing returns must be used. 

2.5 Pathological Example 

Let us examine a simple example where the myopic 
estimate behaves poorly: 

• $S$ is a set of two items, $s_1$ and $s_2$, where the value
of $s_1$ is known exactly, $x_1 = 1$;

• the prior belief about the value of $s_2$ is a normal
distribution $p_2(x) = N(x; 0, 1)$;

• the observation variance is $\sigma_o^2 = 5$;

• the observation cost is constant, and chosen so
that the net value estimate of a two-observation
step is zero: $c(j) \approx 0.00144$;

• the utility is a step function:
The plot in Figure 1 is computed according to belief
update formulas for normally distributed beliefs and 
presents dependency of the intrinsic value estimate 
on the number of observations in a single step. The 
straight line corresponds to the measurement costs. 
Figure 1: Intrinsic value and measurement cost

Under these conditions, the algorithm with one measurement
per step will terminate without gathering evidence
because the value estimate of the first step is
negative, and will return item $s_1$ as best. However,
observing $s_2$ several times in a row has a positive value,
and the updated expected utility of $s_2$ can eventually
become greater than $u(s_1)$. Figure 1 also shows the
intrinsic value growth rate as a function of the num- 
ber of measurements: it increases up to a maximum 
at 3 measurements, and then goes down. Apparently, 
the myopic scheme does not "see" as far as the initial 
increase. 
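The behaviour described in this example can be checked numerically. The sketch below is illustrative only and rests on an assumption: the definition of the step utility is not present in this copy of the text, so the code assumes $u(x) = 0$ for $x < 1$, $u(1) = 0.5$, and $u(x) = 1$ for $x > 1$. The function names and the integration grid are likewise hypothetical; the prior, the observation variance, and the cost are the values quoted above.

```python
import math

COST = 0.00144  # per-observation cost quoted in the example


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def intrinsic_value(k, prior_var=1.0, obs_var=5.0, threshold=1.0,
                    u_known=0.5, grid=4000):
    """Expected gain (cost excluded) from k back-to-back observations of s2.

    Assumes a zero-mean normal prior and the step utility described in the
    lead-in, so E[u(s2)] equals the posterior probability that x > threshold.
    """
    post_var = 1.0 / (1.0 / prior_var + k / obs_var)  # normal belief update
    post_sd = math.sqrt(post_var)
    # Before observing, the posterior mean is N(0, prior_var - post_var).
    mean_sd = math.sqrt(prior_var - post_var)
    lo, hi = -8.0 * mean_sd, 8.0 * mean_sd
    step = (hi - lo) / grid
    total = 0.0
    for n in range(grid + 1):  # trapezoidal rule over the posterior mean
        mu = lo + n * step
        eu = 1.0 - norm_cdf((threshold - mu) / post_sd)
        gain = max(eu - u_known, 0.0)  # benefit only if s2 overtakes s1
        weight = 0.5 if n in (0, grid) else 1.0
        total += weight * gain * norm_pdf(mu / mean_sd) / mean_sd * step
    return total


def net_value(k):
    """Intrinsic value of a k-observation step minus its total cost."""
    return intrinsic_value(k) - k * COST
```

Under these assumptions, `net_value(1)` is negative while `net_value(3)` is positive, which is the premature-termination effect the example describes.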

3 SEMI-MYOPIC VOI ESTIMATES 

The above pathological example was inspired by a phenomenon
we encountered in a real-world problem, optimizing
parameters in setups of imaging machines. On
data with varying prior beliefs, the myopic scheme al- 
most never measured an item with high prior variance 
even when it was likely to become the best item after 
a sufficient number of measurements. 

Keeping the complexity manageable (the number 
of possible sensing plans over continuous variables
is uncountably infinite, in addition to being multi-dimensional)
while overcoming the premature termination
is the basis for the semi-myopic value of information
estimate. Consider the set of all possible measurement
actions $\mathcal{M}$. Let C be a constraint over sets
of measurements from $\mathcal{M}$. In the semi-myopic framework,
we consider all possible subsets B of measurements
from $\mathcal{M}$ that obey the constraint C, and for each such
subset B compute a "batch" VOI that assumes that
all measurements in B are made, followed by a decision
(selection of an item in our case). Then, the batch
B* with the best estimated VOI is chosen. Once the best
batch B* is chosen, there are still several options: 

1. Actually do all the measurements in B*. 

2. Attempt to optimize B* into some form of con- 
ditional plan of measurements and the resulting 
observations. 

3. Perform the best measurement in B*.

In all cases, after measurements are made, the selec- 
tion is repeated, until no batch has a positive net VOI, 
at which point the algorithm terminates and selects an 
item. Although we have experimented with option 1 
for comparative purposes, we did not further pursue it 
as empirical performance was poor, and opted for option 3
in this paper.^1 Observe that the constraint C is
crucial. For an empty constraint (called the exhaustive 
scheme), all possible measurement sets are considered. 
Note that this has an exponential computational cost, 
while still not guaranteeing optimality. At the other 
extreme, the constraint is that only singleton sets be 
allowed, resulting in the greedy single-step assump- 
tion, which we call the pure myopic scheme. 

The myopic estimate can be extended to the case when 
a single step consists of $k \geq 1$ measurements of a
single item $s_i$. We denote the estimate of such a
$k$-measurement step by $MVI_i^k$. Our main contribution
is thus the case where the constraint is that all
measurements be for the same item, which we call the 
blinkered scheme. Yet another scheme we examine is 
where the constraint allows at most one measurement 
for each item (thus allowing from zero to n measure- 
ments in a batch), called the omni-myopic scheme. 
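The four constraints named above differ only in which batches they admit. The following sketch enumerates the candidate batches for each scheme; it is an illustration under assumptions, not the authors' code. Items are represented by integer indices, a batch is a tuple in which a repeated index means a repeated measurement, the empty batch is omitted, and the scheme names are hypothetical string labels.

```python
from itertools import combinations, combinations_with_replacement


def batches(n_items, budget, scheme):
    """Yield candidate batches as tuples of item indices."""
    items = range(n_items)
    if scheme == "myopic":          # singleton batches only
        for i in items:
            yield (i,)
    elif scheme == "blinkered":     # k measurements of one item, 1 <= k <= budget
        for i in items:
            for k in range(1, budget + 1):
                yield (i,) * k
    elif scheme == "omni-myopic":   # at most one measurement per item
        for size in range(1, min(n_items, budget) + 1):
            yield from combinations(items, size)
    elif scheme == "exhaustive":    # any multiset of measurements within the budget
        for size in range(1, budget + 1):
            yield from combinations_with_replacement(items, size)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
```

Counting the batches makes the cost difference concrete: the blinkered scheme grows as the number of items times the budget, while the exhaustive scheme grows combinatorially.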

3.1 Blinkered Value of Information 

As stated above, the blinkered scheme considers sets of 
measurements that are all for the same item; this constitutes
unlimited lookahead, but along a single "direction",
as if we "had our blinkers on". That is, the
VOI is estimated for any number of independent ob- 
servations of a single item. Although this scheme has a 
significant computational overhead over pure-myopic, 
the factor is only linear in the maximum budget.^2
We define the "blinkered" value of information as:

$$BVI_i = \max_k MVI_i^k \qquad (7)$$

Driven by this estimate, the blinkered scheme selects
a single measurement of the item where some number
of measurements gain the greatest VOI. A single
step is expected not to achieve the value, but to be
just the first one in the right direction. Thus, the estimate
relaxes the single-step assumption, while still
underestimating the value of information.

For finite budget $m$ the time to compute the estimate
$T_{BVI}$ is: $T_{BVI} = O(T_{MVI}\, m)$. If $MVI_i^k$ is a unimodal
function of $k$, which can be shown for some forms of
distributions and utility functions, the time is only
logarithmic in $m$ using bisection search.

^1 Option 2 is intractable in general, and while limited
efficient optimization may be possible, this issue is beyond
the scope of this paper.

^2 This complexity assumes either normal distributions,
or some other distribution that has compact statistics. For
general distributions, sets of observations may provide
information beyond the mean and variance, and the resources
required to compute VOI may even be exponential in the
number of measurements.
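The blinkered estimate of Section 3.1 can be sketched in a few lines. This is a hedged illustration: `mvi_k` is a hypothetical callable returning the net estimate of a step of $k$ measurements of one fixed item, and the second function is valid only under the additional assumption that `mvi_k` is strictly unimodal in $k$.

```python
def blinkered_voi(mvi_k, budget):
    """Return (BVI, best_k): the maximum of mvi_k(k) over k = 1..budget."""
    best_k = max(range(1, budget + 1), key=mvi_k)
    return mvi_k(best_k), best_k


def blinkered_voi_unimodal(mvi_k, budget):
    """Same result with O(log budget) evaluations for strictly unimodal mvi_k."""
    lo, hi = 1, budget
    while lo < hi:
        mid = (lo + hi) // 2
        if mvi_k(mid) < mvi_k(mid + 1):
            lo = mid + 1  # still climbing: the peak lies to the right
        else:
            hi = mid      # at or past the peak
    return mvi_k(lo), lo
```

With a sigmoid-shaped intrinsic value and a linear cost, `mvi_k(1)` can be negative while the returned BVI is positive, so the blinkered scheme keeps measuring where the pure myopic scheme would stop.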

3.2 Theoretical Bounds 

We establish bounds on the blinkered scheme for two
special cases, beginning with the termination condition
for the case of only 2 items.

Theorem 1. Let $S = \{s_1, s_2\}$, where the value of $s_1$ is
known exactly. Let $m_b$ be the remaining budget of measurements
when the blinkered scheme terminates, and
$C$ be the (positive) cost of each measurement. Then the
(expected) value of information that can be achieved by
an optimal policy from this point onwards is at most
$m_b C$.

Proof. The intrinsic value of information of the remaining
budget when the blinkered scheme terminates
is $V_b^{int} \le m_b C$, since otherwise it would not have
terminated. Since there is only one type of measurement,
the intrinsic value of information $V_o^{int}$ achieved by an
optimal policy must be at most equal to making all
the measurements, thus $V_o^{int} \le V_b^{int} \le m_b C$. Since
measurement costs are positive, the net value of information
$V_o^{net}$ achieved by the optimal policy must
therefore also be at most $m_b C$. □

This bound is asymptotically tight. This can be shown
by having a measurement model with dependencies
such that the first measurements do not change the 
expected utilities of the item, but provide information 
on whether it is worthwhile to perform additional mea- 
surements. 

The second bound is related to a common case of a finite
budget and free measurements. It provides certain
performance guarantees for the case when dependencies
between items are sufficiently weak.

Definition. Measurements of two items $s_1$, $s_2$ are
mutually submodular if, given sets of measurements
of each item, $M_1$ and $M_2$, the intrinsic value of information
of the union of the sets is not greater than the
sum of the intrinsic values of each of the sets, i.e.:

$$V^{int}(M_1 \cup M_2) \le V^{int}(M_1) + V^{int}(M_2)$$

Theorem 2. For a set of $n$ items, measurement cost
$C = 0$, and a finite budget of $m$ measurements, if measurements
of every two items are mutually submodular,
the value of information collected by the blinkered
scheme is no more than a factor of $n$ worse than the
value of information collected by an optimal measurement
plan.

Proof. Since the measurement cost is 0, $V^{net} = V^{int}$, and
we omit the superscript in the proof. Expected value
of information cannot decrease by making additional
measurements, therefore the value of any set of measurements
containing the set of measurements in an
optimal plan is at least as high as the value $V_o$ of an
optimal plan. Consider an exhaustive plan containing
$m$ measurements of each of the $n$ items, $mn$ measurements
in total, with value $V_e$. The exhaustive plan contains
all measurements that can be made by an optimal
plan within the budget, thus $V_o \le V_e$.

Let $s_j$ be the item with the highest blinkered value
for $m$ measurements, and denote its value by
$V_{b,\max} = \max_i V_{b,i}$. Since measurements of different items
are mutually submodular, $V_e \le n V_{b,\max}$, and thus
$V_{b,\max} \ge V_e / n \ge V_o / n$.

The blinkered scheme selects at every step a measurement
from a plan with value of information which is
at least as high as the value of measurements of $s_j$ for
the rest of the budget. Thus, its value of information

$$V_b \ge V_{b,\max} \ge \frac{V_o}{n}$$

□



The bound is asymptotically tight. Construct a problem
instance as follows: $n$ items, with expected values
of information $v_j(i)$, for $i$ measurements, respectively,
defined below. The value of information of a combination
of the measurements is the sum of the values for each
item, $v(i_1, \ldots, i_n) = \sum_{j=1}^{n} v_j(i_j)$, with the following value of
information functions:



v\(i) 



ifz<f 
\l ^ otherwise 



Here, the optimal policy is to measure each item $\frac{m}{n}$
times. The resulting value of information for $m$ measurements
is:



Vo = vi + (n - l)w,>i ( 



n J V n 



lim n 

k — *oo 



But the blinkered algorithm will always choose the first
item with $V_b = v_1(m) = 1$.



4 EMPIRICAL EVALUATION

It is rather difficult to perform informative experiments
on a real setting of the selection problem.
Therefore, other than one case coming from a parameter
selection application, empirical results in this paper
are for artificially generated data.

4.1 Simulated Runs

The first set of experiments is for independent normally
distributed items. For a given number of items,
we randomly generate their exact values and initial
prior beliefs. Then, for a range of measurement costs,
budgets, and observation precisions, we run the search,
randomly generating observation outcomes according
to the exact values of the items and the measurement
model. The performance measure is the regret: the
difference between the utility of the best item and the
utility of the selected item, taking into account the
measurement costs. We examine the result of using
the blinkered scheme vs. other semi-myopic schemes.

The first comparison is the difference in regret between
the myopic and blinkered schemes, done for 2 items,
one of which has a known value (Table 1). A positive
value in a cell indicates an improvement due to
the blinkered estimate. Note that the absolute value
is bounded by 0.5, the difference in the utility of the
exactly known item and the extremal utility.

Table 2: 4 items, 10 measurement budget

Averaged over 100 runs of the experiment, the difference
is significantly positive for most combinations of
the parameters. In the first experiment (Table 1), the
average regret of the myopic scheme compared to the
blinkered scheme is 0.15 with standard deviation 0.1.
In the second experiment (Table 2), the regret is 0.27
with standard deviation 0.19.
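The regret measure used in these experiments can be written down directly. The sketch below is one plausible reading, stated as an assumption: measurement costs are charged against the utility of the selected item, so they add to the regret. The function and argument names are hypothetical.

```python
def regret(true_utilities, selected, measurement_costs):
    """Utility of the best item minus the net utility actually obtained.

    true_utilities: mapping from item to its true utility.
    selected: the item chosen by the scheme.
    measurement_costs: costs of the measurements performed before selecting.
    """
    best = max(true_utilities.values())
    net = true_utilities[selected] - sum(measurement_costs)
    return best - net
```

For example, selecting an item of utility 0.5 when the best item has utility 1.0, after three measurements costing 0.002 each, gives a regret of about 0.506 under this reading.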



4.1.1 Other Semi-Myopic Estimates

In this set of experiments, we compare three semi-myopic
schemes: blinkered, omni-myopic, and exhaustive.
All schemes were run on a set of 5 items with a
10 measurement budget. The results show that while
blinkered is significantly better than pure myopic
(Table 3), exhaustive is only marginally better than
blinkered (Table 4), even though it requires evaluating an
exponential number of sets of measurements. Another
semi-myopic scheme, omni-myopic, is only marginally
better than myopic (Table 5).
Table 5: myopic vs. omni-myopic



4.1.2 Dependencies between Items 

When the values of the items are linearly dependent,
e.g. when $x_i = x_{i-1} + w$, with $w$ being a random
variable distributed as $N(0, \sigma_w^2)$, the VOI of a series of
observations of several items can be greater than the
sum of the VOI of each observation. We examine the
influence of dependencies on the relative quality of the
blinkered and omni-myopic schemes.

When there are no dependencies, i.e. $\sigma_o^2 / \sigma_w^2 = 0$, the
blinkered scheme is significantly better. But as $\sigma_w^2$
decreases, the omni-myopic estimate performs better.
Figure 2 shows the difference between achieved utility
of the blinkered and the myopic schemes with dependencies.
The experiment was run on a set of 5 items,
with the prior estimate $N(0, 1)$, measurement precision
$\sigma_o^2 = 4$, measurement cost $C = 0.002$ and a budget
of 10 measurements. The results are averaged over
100 runs of the experiment.
Figure 2: Influence of dependencies
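The linear dependency model of this experiment can be illustrated by sampling item values along a chain. This is a sketch under assumptions rather than the experimental code: the first item is drawn from a zero-mean normal with a hypothetical standard deviation `x0_sd`, and `sigma_w` is the standard deviation of the step noise, so smaller values mean stronger dependencies.

```python
import random


def sample_chain(n_items, sigma_w, x0_sd=1.0, rng=None):
    """Draw item values with x_i = x_{i-1} + w, where w ~ N(0, sigma_w^2)."""
    rng = rng or random.Random()
    values = [rng.gauss(0.0, x0_sd)]
    for _ in range(n_items - 1):
        # Each value is the previous one plus independent Gaussian noise.
        values.append(values[-1] + rng.gauss(0.0, sigma_w))
    return values
```

With `sigma_w = 0` all items share one value, so a single observation is informative about every item; with a very large `sigma_w` the items are nearly independent.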



In the absence of dependencies, the omni-myopic algo- 
rithm does not perform measurements and chooses an 
item at random, thus performing poorly. As the de- 
pendencies become stronger, the omni-myopic scheme 
collects evidence and eventually outperforms the blink- 
ered scheme. In the experiment, the omni-myopic 
scheme begins to outperform the blinkered scheme 
when dependencies between the items are roughly half 
as strong as the measurement accuracy. The experi- 
mental results thus encourage the use of the blinkered 
value of information estimate in problems with increas- 
ing returns for certain combinations of parameters and 
weak dependencies between the items. 

4.2 Results on Real Data 

Due to lack of space, we only outline some main 
aspects of an additional application of the selection 
problem - parameter optimization for imaging machines,
which has items arranged as points on a multi-dimensional
grid, with grid-structured Markov dependencies.
In this application one dimension was "filter
color", and another dimension was "focal length
index". The utility of an item is based on features
observed in each image, and we examine results of
one case along just the focal length index dimension.
The utility function is a hyperbolic tangent of the
measured features. We assumed that item values
were normally distributed, and the dependencies were
roughly estimated from the data, with measurement
variance $\sigma_o^2 \approx 0.1$ and $\sigma_w^2 \approx 0.2$.

Figure 3 shows a summary of measurements made by the
blinkered scheme vs. pure myopic, where initial beliefs
(for most items, variance was approximately 0.8),
due to some previous measurements, included prior
knowledge (smaller variance in beliefs: approximately
0.05) for focal length indices marked with black boxes.
The pure myopic scheme measured only items with
small variances, and eventually picked an inferior focal
length index. The blinkered scheme performed different
measurements, ending up selecting the optimal
index.

Figure 3: Blinkered vs Myopic Measurements

These results are for one typical data set. Unfortu- 
nately for this problem, in experiments on real data, 
the result set is of necessity rather sparse, as it is dif- 
ficult to map the space of possibilities as was done 
for simulated data. Such an exploration would require 
us to predict, e.g. what would have happened had 
the item value been different? What would have been 
the result had we performed a measurement for (some 
unmeasured) item? Although the latter question was 
handled in our system by physically performing nu- 
merous measurements for all items, the former ques- 
tion is much more difficult to handle. 

5 RELATED WORK 

Limited rationality, a model of deliberation based on 
value of utility revision and deliberation time cost was 
introduced in . Notions of value of computation and 
its estimate were defined, as well as the class of meta- 
greedy algorithms and simplifying assumptions under 
which the algorithms are applicable. The theory of 
bounded optimality, on which the approach is based, 
is further developed in [10]. [12] employs limited rationality
techniques to analyze any-time algorithms and
proves optimality of myopic algorithm monitoring un- 
der assumptions about the class of value and time cost 
functions. 

[B] consider a greedy algorithm for observation selec- 
tion based on value of observation. They show that 
when values of measurements for different items are 
mutually submodular and the measurement cost is 
fixed, the greedy algorithm is nearly optimal. The 



assumptions are inapplicable in our domain, necessi- 
tating an extension of the pure greedy approach in our 
case. 

In [2], a case of discrete Bayesian networks with a single
decision node is analyzed. The authors propose to
consider subsequences of observations in the descend- 
ing order of their myopic value estimates. If any such 
subsequence has non-negative value estimate, then the 
computation with the greatest myopic estimate is cho- 
sen. However, this approach always chooses a mea- 
surement for the myopically best item, and when ap- 
plied to the selection problem either looks at sequences 
of measurements on a single item with the greatest 
myopic value estimate, or, if sequences with one mea- 
surement per item are considered, does not provide an 
improvement over the myopic estimate for our pathological
example. Still, in many cases their scheme shows
an improvement in performance. [7] describes and ex- 
perimentally analyzes an algorithm for influence dia- 
grams based on a non-myopic VOI estimate. 

Multi-armed bandits bear similarity to the mea- 
surement selection problem, in particular, when the 
reward distribution is continuous and unknown. Some 
of the algorithms, e.g. POKER (Price of Knowledge 
and Estimated Reward) employ the notion of value 
of information. However, most solutions concentrate 
on exploitation of particular features of the value func- 
tion, such as linear dependence of reward from pushing 
a lever on the time left, and do not facilitate generaliza- 
tion. On the other hand, achievements in limited ra- 
tionality techniques should be helpful in development 
of improved solutions in this domain. 

6 CONCLUSION 

We have introduced a new "semi-myopic" value of 
information framework. An instance of semi-myopic 
scheme, called the blinkered scheme, was introduced, 
and demonstrated to have positive impact on solving 
the selection problem. Theoretical analysis of special 
cases provides some insights. Empirical evaluation of 
the blinkered scheme on simulated data shows that 
it is promising both for independent and for weakly 
dependent items. A limited evaluation of an actual 
application also indicates that the blinkered scheme is 
useful. 

Still, properties of the estimate have been investigated 
only partially for the dependent case, which is of more 
practical importance. In particular, when, due to suf- 
ficiently strong dependencies, observations in different 
locations are not mutually submodular, the blinkered 
estimate alone may not prevent premature termination 
of the measurement plan, and its combination with the 
approach proposed in [2] may be worthwhile.



During the algorithm analysis, several assumptions 
have been made about the shape of utility functions 
and belief distributions. Certain special cases, such as 
normally distributed beliefs and convex utility func- 
tions, are frequently met in applications and may lead 
to stronger bounds and discovery of additional features 
of semi-myopic schemes. 

An important application area of the selection prob- 
lem, parameter optimization, has items arranged 
as points on a multi-dimensional grid, with grid- 
structured Markov dependencies. This special case has 
been partially investigated and requires future work. 
Extending this case to points on a continuous grid 
should also have numerous practical applications. 